Figure 5.19 shows that samples are clustered into tumor and normal samples based on the
profiles of the genes in these samples.
5.3.7.8 Model Fitting
Once we have computed the dispersion estimates, we can use them to fit the negative binomial
generalized linear model (GLM) and then carry out the testing procedures for determining
differential expression. edgeR has two functions for fitting RNA-Seq count data
to GLMs: the “glmQLFit” function, which fits a quasi-likelihood negative
binomial generalized log-linear model, and the “glmFit” function, which fits
a negative binomial generalized log-linear model. The difference between the two
functions is that “glmQLFit” uses the trended negative binomial dispersion for fitting and
then estimates the quasi-likelihood dispersion from the deviance, while “glmFit” uses the
tagwise negative binomial dispersion for model fitting. Either function can be used to
fit the count data. Run the following to fit the count data to the quasi-likelihood negative
binomial model:
fitq <- glmQLFit(yNorm, design)
names(fitq)
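For comparison, the two fitting functions can be run side by side. The following is only a minimal sketch on simulated data, not the tumor/normal dataset analyzed in this chapter: the objects "counts", "group", "y", "fit", and "fitq_toy" are hypothetical names introduced for illustration, and the toy counts are drawn at random. Which dispersion each function picks up by default may vary between edgeR versions, so check the help pages of your installed version.

```r
library(edgeR)

set.seed(1)

# Hypothetical toy data: 200 genes x 6 samples (3 normal, 3 tumor).
counts <- matrix(rnbinom(200 * 6, mu = 50, size = 5), nrow = 200)
rownames(counts) <- paste0("gene", seq_len(200))
colnames(counts) <- paste0("sample", seq_len(6))
group <- factor(rep(c("normal", "tumor"), each = 3))
design <- model.matrix(~ group)

# Build the DGEList, normalize, and estimate the common, trended,
# and tagwise dispersions.
y <- DGEList(counts = counts, group = group)
y <- calcNormFactors(y)
y <- estimateDisp(y, design)

# Negative binomial GLM fit (described in the text as using the
# tagwise dispersions).
fit <- glmFit(y, design)

# Quasi-likelihood negative binomial GLM fit.
fitq_toy <- glmQLFit(y, design)

# List the components stored in each fitted object.
names(fit)
names(fitq_toy)
```

Both calls return a fitted-model object whose components, such as the coefficients, can be inspected with "names()" as shown above for "fitq".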
FIGURE 5.19 Heatmap clustering samples and top 10 variable genes.